A Decoupled Fetch-Execute Engine with Static Branch Prediction Support
Authors
Abstract
We describe a method for supporting static branch prediction on a decoupled fetch-execute pipeline. Using instruction buffers to decouple instruction fetch from the execute pipeline is an effective way to minimize instruction-cache penalties, since it allows instruction fetch and cache-miss handling to proceed independently of the execution pipeline. Dynamic branch prediction is typically used with such architectures, but it is not necessary to assume the cost of dynamic branch-prediction hardware when static prediction is sufficient. Traditional static branch prediction approaches were designed for lock-step pipelines and do not adapt well to decoupled fetch-execute pipelines, so alternative means of support were required. We describe the requirements for achieving efficient static branch prediction on a decoupled fetch-execute architecture, and present the design and results of an implementation on an EPIC-style target architecture.
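To make the decoupling idea concrete, the following is a minimal sketch, not the paper's design: a fetch unit that runs ahead of the execute stage through a small instruction buffer and steers itself at branches with a static backward-taken/forward-not-taken (BTFNT) heuristic. The buffer depth, toy instruction encoding, and flush-on-misprediction policy are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (illustrative assumptions only): a decoupled fetch unit runs
# ahead of execute through an instruction buffer, redirecting itself at
# branches with a static backward-taken / forward-not-taken (BTFNT) rule.
from collections import deque

BUFFER_DEPTH = 8  # assumed depth of the decoupling buffer

# Toy program: (opcode, operand). Branches carry a target PC and a 'taken'
# flag that only the execute stage learns for certain.
PROGRAM = [
    ("op",  None), ("op", None),
    ("br",  {"target": 0, "taken": False}),   # backward branch, actually not taken
    ("op",  None),
    ("br",  {"target": 7, "taken": True}),    # forward branch, actually taken
    ("op",  None), ("op", None),
    ("halt", None),
]

def static_predict(pc, target):
    """BTFNT: predict taken if the target is at or behind the branch."""
    return target <= pc

def run(program):
    fetch_pc, buffer, executed, cycle = 0, deque(), [], 0
    while True:
        cycle += 1
        # --- Fetch side: proceeds whenever the buffer has room ---
        if len(buffer) < BUFFER_DEPTH and fetch_pc < len(program):
            opcode, arg = program[fetch_pc]
            predicted_next = fetch_pc + 1
            if opcode == "br" and static_predict(fetch_pc, arg["target"]):
                predicted_next = arg["target"]           # redirect fetch early
            buffer.append((fetch_pc, opcode, arg, predicted_next))
            fetch_pc = predicted_next
        # --- Execute side: consumes one buffered instruction per cycle ---
        if buffer:
            pc, opcode, arg, predicted_next = buffer.popleft()
            executed.append(pc)
            if opcode == "halt":
                return executed, cycle
            if opcode == "br":
                actual_next = arg["target"] if arg["taken"] else pc + 1
                if actual_next != predicted_next:        # static misprediction
                    buffer.clear()                       # flush wrong-path fetch
                    fetch_pc = actual_next               # restart fetch

print(run(PROGRAM))   # -> ([0, 1, 2, 3, 4, 7], 6) with the toy program above
```

The point of the buffer is that fetch can keep filling it, or ride out a cache miss, while execute drains it; in this sketch the only coupling between the two sides is the flush-and-redirect that execute triggers when the static prediction turns out to be wrong.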
Similar resources
Latency Tolerant Branch Predictors
The access latency of branch predictors is a well-known problem in fetch engine design. Prediction overriding techniques are commonly accepted to overcome this problem. However, prediction overriding requires a complex recovery mechanism to discard the wrong speculative work based on overridden predictions. In this paper, we show that stream and trace predictors, which use long basic prediction...
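As a hedged illustration of the overriding idea (not taken from the cited paper): a one-cycle "fast" predictor steers fetch immediately, a slower but more accurate predictor checks that choice a few cycles later, and a disagreement overrides the fast guess and squashes the instructions fetched in the meantime. The latencies and fetch width below are assumed values.

```python
# Timeline sketch of prediction overriding under assumed latencies.
FAST_LATENCY, SLOW_LATENCY = 1, 3   # assumed predictor latencies (cycles)

def resolve_branch(fast_taken, slow_taken, fetch_width=4):
    """Return how much speculative fetch work survives and when fetch is redirected."""
    # Instructions fetched under the fast guess while the slow predictor is still working.
    speculative = (SLOW_LATENCY - FAST_LATENCY) * fetch_width
    if slow_taken == fast_taken:
        # Agreement: keep the speculative work; only the fast latency is paid.
        return {"kept": speculative, "squashed": 0, "redirect_cycle": FAST_LATENCY}
    # Override: discard the wrong-path fetch and restart from the slow prediction.
    return {"kept": 0, "squashed": speculative, "redirect_cycle": SLOW_LATENCY}

print(resolve_branch(fast_taken=True, slow_taken=True))    # predictors agree
print(resolve_branch(fast_taken=True, slow_taken=False))   # override + recovery cost
```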
A latency-conscious SMT branch prediction architecture
Executing multiple threads has proved to be an effective solution to partially hide latencies that appear in a processor. When a thread is stalled because a long-latency operation is being processed, like a memory access or a floating-point calculation, the processor can switch to another context so that another thread can take advantage of the idle resources. However, fetch stall conditions cau...
On the Performance of Fetch Engines Running DSS Workloads
This paper examines the behavior of current- and next-generation microprocessors' fetch engines while running Decision Support Systems (DSS) workloads. We analyze the effect of the latency of instructions being fetched, their quality, and the number of instructions that the fetch engine provides per access. Our study reveals that a well-dimensioned fetch engine is of great importance for DSS perf...
Tolerating Branch Predictor Latency on SMT
Simultaneous Multithreading (SMT) tolerates latency by executing instructions from multiple threads. If a thread is stalled, resources can be used by other threads. However, fetch stall conditions caused by multi-cycle branch predictors prevent SMT from achieving its full potential performance, since the flow of fetched instructions is halted. This paper proposes and evaluates solutions to deal with...
Optimizations Enabled by a Decoupled Front-End Architecture
In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instruction delivery mechanism scale with the execution core. Attaining these targets is a challenging task due to I-cache misses, branch mispredictio...
Journal:
Volume, Issue:
Pages:
Publication date: 1999